Smart Paper Notes

Discovering New Classes in Tabular Data

Colin Troisemaine, Joachim Flocon-Cholet, Stéphane Gosselin, Sandrine Vaton, Alexandre Reiffers-Masson, Vincent Lemaire

Category: Machine Learning

2022-11-28

In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set, given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, even though it is a very common data representation. In this paper, we propose TabularNCD, a new method for discovering novel classes in tabular data. We show a way to extract knowledge from already known classes to guide the discovery of novel classes in tabular data that contains heterogeneous variables. Part of this process relies on a new method for defining pseudo labels, and we follow recent findings in Multi-Task Learning to optimize a joint objective function. Our method demonstrates that NCD is applicable not only to images but also to heterogeneous tabular data.


Extensive experiments are conducted to evaluate our method and demonstrate its effectiveness against 3 competitors on 7 different public classification datasets.


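This paper is concerned with the asymptotic distribution of the largest eigenvalues of some nonlinear random matrix ensembles stemming from the study of neural networks. More precisely, we consider $M = \frac{1}{m} YY^\top$ with $Y = f(WX)$, where $W$ and $X$ are random rectangular matrices with centered entries. This models the data covariance matrix or the conjugate kernel of a single-layer random feed-forward neural network. The function $f$ is applied entrywise and can be seen as the activation function of the neural network. We show that the largest eigenvalue has the same limit (in probability) as that of some well-known linear random matrix ensembles. In particular, we relate the asymptotic limit of the largest eigenvalue of the nonlinear model to that of an information-plus-noise random matrix, establishing a possible phase transition depending on the function $f$ and the distributions of $W$ and $X$. This may be of interest for applications to machine learning.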
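The matrix described in this abstract, M = (1/m) Y Y^T with Y = f(WX), can be simulated numerically to get a feel for its largest eigenvalue. The sketch below is illustrative only and is not taken from the paper: the function name `largest_eigenvalue`, the dimension names `n0`, `n1`, `m`, the Gaussian entries, the 1/sqrt(n0) scaling of W, and the choice of tanh as the activation are all assumptions made for the example.

```python
import numpy as np


def largest_eigenvalue(n0, n1, m, f=np.tanh, seed=0):
    """Largest eigenvalue of M = (1/m) Y Y^T with Y = f(W X).

    Assumptions (not from the source): W is n1 x n0 and X is n0 x m,
    both with centered Gaussian entries; W is scaled by 1/sqrt(n0) so
    that the entries of W X stay of order one; f defaults to tanh.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n1, n0)) / np.sqrt(n0)  # random weights
    X = rng.standard_normal((n0, m))                 # random data
    Y = f(W @ X)                                     # entrywise activation
    M = (Y @ Y.T) / m                                # n1 x n1, symmetric
    # eigvalsh returns eigenvalues in ascending order for symmetric input,
    # so the last entry is the largest one.
    return float(np.linalg.eigvalsh(M)[-1])


if __name__ == "__main__":
    print(largest_eigenvalue(n0=200, n1=300, m=400))
```

Repeating the call with different activations `f` or different entry distributions is one informal way to explore the dependence on f, W, and X that the abstract mentions. A simulation of this kind illustrates the setting but does not establish the limit itself.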
